@maximizemaxwell
Contributor

@maximizemaxwell maximizemaxwell commented Sep 6, 2025

Part of issue

#3057

What does this PR do?

Implements SNAC features and integration.

Summary

  • Implement SNAC (Multi-Scale Neural Audio Codec) integration for
    Text-to-Speech applications
  • Add comprehensive TTS utilities and configuration presets for speech
    synthesis
  • Provide example demonstrating Qwen + SNAC TTS pipeline

Changes Made

  • New module: snac_tts_integration.rs with TTS-optimized SNAC codec
    wrapper
  • Enhanced SNAC model: Added TTS-specific methods (encode_for_tts,
    decode_from_tts_tokens, batch processing)
  • Config presets: Added default_tts(), high_quality_tts(), fast_tts()
    configurations
  • Utility functions: Memory estimation, token validation, voice embedding
    creation
  • Example implementation: qwen_snac_tts_example.rs showing complete TTS
    pipeline
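To illustrate the token validation and memory estimation utilities listed above, here is a minimal sketch. The function names `validate_tokens` and `estimate_audio_bytes`, their signatures, and the codebook size used in the usage note are assumptions made for illustration; they are not taken from this PR's code.

```rust
/// Hypothetical helper (not the PR's API): checks that every token id
/// is a valid index into a codebook with `codebook_size` entries.
fn validate_tokens(tokens: &[u32], codebook_size: u32) -> Result<(), String> {
    match tokens.iter().position(|&t| t >= codebook_size) {
        Some(i) => Err(format!(
            "token {} at index {} is out of range for codebook size {}",
            tokens[i], i, codebook_size
        )),
        None => Ok(()),
    }
}

/// Hypothetical helper (not the PR's API): rough size in bytes of the
/// decoded waveform, assuming f32 samples. Model weights and intermediate
/// activations are not counted.
fn estimate_audio_bytes(
    batch: usize,
    channels: usize,
    sample_rate: usize,
    seconds: f64,
) -> usize {
    // Round up so a partial sample still reserves space.
    let samples = (sample_rate as f64 * seconds).ceil() as usize;
    batch * channels * samples * std::mem::size_of::<f32>()
}
```

For example, `validate_tokens(&[0, 4096], 4096)` returns an error because 4096 is not a valid index into an assumed 4096-entry codebook, and `estimate_audio_bytes(1, 1, 24_000, 1.0)` counts 24,000 samples of 4 bytes each.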

Key Features

  • Multiple quality presets: 24kHz speech, 32kHz general, fast 16kHz
    options
  • TTS pipeline abstraction: SnacTtsPipeline for easy integration with
    language models
  • Batch processing support: Efficient handling of multiple audio streams
  • Memory optimization: Token padding, truncation, and memory estimation
    utilities
  • Voice cloning support: Reference audio embedding extraction
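The token padding and truncation mentioned above could look roughly like the following sketch. The names `pad_or_truncate` and `pad_batch` and the choice of padding id are hypothetical; the PR may handle this differently, for instance per codebook level.

```rust
/// Hypothetical helper (not the PR's API): forces a token sequence to
/// exactly `target_len` entries by truncating the tail or appending `pad_id`.
fn pad_or_truncate(tokens: &[u32], target_len: usize, pad_id: u32) -> Vec<u32> {
    // Keep at most `target_len` tokens from the front.
    let mut out: Vec<u32> = tokens.iter().copied().take(target_len).collect();
    // Fill the remainder, if any, with the padding id.
    out.resize(target_len, pad_id);
    out
}

/// Hypothetical helper: pads every sequence in a batch to the length of the
/// longest one so the batch can be stacked into a rectangular tensor.
fn pad_batch(batch: &[Vec<u32>], pad_id: u32) -> Vec<Vec<u32>> {
    let max_len = batch.iter().map(|s| s.len()).max().unwrap_or(0);
    batch
        .iter()
        .map(|s| pad_or_truncate(s, max_len, pad_id))
        .collect()
}
```

Padding to the longest sequence in the batch avoids discarding audio tokens, at the cost of extra memory for short sequences; truncating to a fixed length bounds memory instead.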

@maximizemaxwell maximizemaxwell marked this pull request as draft September 6, 2025 13:06
@lucasjinreal

Hi, is this about to work? It seems we could support SparkTTS and VovyTTs once this is workable.

@maximizemaxwell maximizemaxwell marked this pull request as ready for review September 16, 2025 11:34
@lucasjinreal

@maximizemaxwell Hi, could you add some checkpoint conversion docs? I'd like to verify whether its results are normal. Once that is done, we can consider merging SNAC support and enabling several SOTA TTS models that use SNAC.
